{
 "cells": [
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Guaranteeing valid output syntax\n",
    "\n",
    "Large language models are great at generating useful outputs, but they are not great at guaranteeing that those outputs follow a specific format. This can cause problems when we want to use the outputs of a language model as input to another system. For example, if we want to use a language model to generate a JSON object, we need to make sure that the output is valid JSON. This can be a real pain with standard APIs, but with `guidance` we can both accelerate inference speed and ensure that generated JSON is always valid.\n",
    "\n",
    "This notebook shows how to generate a JSON object we know will have a valid format. The example used here is a generating a random character profile for a game, but the ideas are readily applicable to any scenario where you want JSON output."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "application/vnd.jupyter.widget-view+json": {
       "model_id": "d7ccafe4314b4b1e83ff21c054646977",
       "version_major": 2,
       "version_minor": 0
      },
      "text/plain": [
       "Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "gpustat is not installed, run `pip install gpustat` to collect GPU stats.\n"
     ]
    },
    {
     "data": {
      "application/vnd.jupyter.widget-view+json": {
       "model_id": "fb52cdbf434c43a1a74deae1ded5440a",
       "version_major": 2,
       "version_minor": 0
      },
      "text/plain": [
       "StitchWidget(initial_height='auto', initial_width='100%', srcdoc='<!doctype html>\\n<html lang=\"en\">\\n<head>\\n …"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "import guidance\n",
    "\n",
    "# Define the model we will use\n",
    "# lm = guidance.models.LlamaCpp(\"/path/to/model.gguf\", n_gpu_layers=-1)\n",
    "lm = guidance.models.Transformers(\"microsoft/Phi-3-mini-4k-instruct\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "application/vnd.jupyter.widget-view+json": {
       "model_id": "1d959bace40b45f4bd853f717bc215e5",
       "version_major": 2,
       "version_minor": 0
      },
      "text/plain": [
       "StitchWidget(initial_height='auto', initial_width='100%', srcdoc='<!doctype html>\\n<html lang=\"en\">\\n<head>\\n …"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "from guidance import gen, select\n",
    "\n",
    "# we can pre-define valid option sets\n",
    "sample_weapons = [\"sword\", \"axe\", \"mace\", \"spear\", \"bow\", \"crossbow\"]\n",
    "sample_armor = [\"leather\", \"chainmail\", \"plate\"]\n",
    "\n",
    "# define a re-usable \"guidance function\" that we can use below\n",
    "@guidance\n",
    "def quoted_list(lm, name, n):\n",
    "    for i in range(n):\n",
    "        if i > 0:\n",
    "            lm += \", \"\n",
    "        lm += '\"' + gen(name, list_append=True, stop='\"') + '\"'\n",
    "    return lm\n",
    "\n",
    "@guidance\n",
    "def generate_character(\n",
    "    lm,\n",
    "    character_one_liner,\n",
    "    weapons: list[str] = sample_weapons,\n",
    "    armour: list[str] = sample_armor,\n",
    "    n_items: int = 3\n",
    "):\n",
    "    lm += f'''\\\n",
    "    {{\n",
    "        \"description\" : \"{character_one_liner}\",\n",
    "        \"name\" : \"{gen(\"character_name\", stop='\"')}\",\n",
    "        \"age\" : {gen(\"age\", regex=\"[0-9]+\")},\n",
    "        \"armour\" : \"{select(armour, name=\"armor\")}\",\n",
    "        \"weapon\" : \"{select(weapons, name=\"weapon\")}\",\n",
    "        \"class\" : \"{gen(\"character_class\", stop='\"')}\",\n",
    "        \"mantra\" : \"{gen(\"mantra\", stop='\"')}\",\n",
    "        \"strength\" : {gen(\"age\", regex=\"[0-9]+\")},\n",
    "        \"quest_items\" : [{quoted_list(\"quest_items\", n_items)}]\n",
    "    }}'''\n",
    "    return lm\n",
    "\n",
    "\n",
    "generation = lm + generate_character(\"A quick and nimble fighter\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We have produced valid JSON:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Loaded json:\n",
      "{\n",
      "    \"description\": \"A quick and nimble fighter\",\n",
      "    \"name\": \"Sabretooth\",\n",
      "    \"age\": 25,\n",
      "    \"armour\": \"leather\",\n",
      "    \"weapon\": \"sword\",\n",
      "    \"class\": \"warrior\",\n",
      "    \"mantra\": \"Fear is my ally\",\n",
      "    \"strength\": 8,\n",
      "    \"quest_items\": [\n",
      "        \"Sabretooth's Sword of Fury\",\n",
      "        \"Leather Armour of the Wilds\",\n",
      "        \"Mantra of the Fearless Warrior\"\n",
      "    ]\n",
      "}\n"
     ]
    }
   ],
   "source": [
    "import json\n",
    "\n",
    "gen_json = json.loads(generation.__str__())\n",
    "\n",
    "print(f\"Loaded json:\\n{json.dumps(gen_json, indent=4)}\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We have also captured our generated text and can access it like a dictionary:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "'sword'"
      ]
     },
     "execution_count": 4,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "generation[\"weapon\"]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Using a schema\n",
    "\n",
    "We can also define a JSON-schema for our character, and then pass that to `guidance`:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [],
   "source": [
    "character_schema = \"\"\"{\n",
    "    \"type\": \"object\",\n",
    "    \"properties\": {\n",
    "        \"description\" : { \"type\" : \"string\", \"maxLength\" : 100 },\n",
    "        \"name\" : { \"type\" : \"string\" },\n",
    "        \"age\" : { \"type\" : \"integer\", \"exclusiveMinimum\" : 18, \"maximum\" : 200 },\n",
    "        \"armour\" : { \"type\" : \"string\", \"enum\" : [\"leather\", \"chainmail\", \"plate\"] },\n",
    "        \"weapon\" : { \"type\" : \"string\", \"enum\" : [\"sword\", \"axe\", \"mace\", \"spear\", \"bow\", \"crossbow\"] },\n",
    "        \"class\" : { \"type\" : \"string\" },\n",
    "        \"mantra\" : { \"type\" : \"string\", \"maxLength\" : 180 },\n",
    "        \"strength\" : { \"type\" : \"integer\", \"exclusiveMinimum\" : 0, \"maximum\" : 20 },\n",
    "        \"quest_items\" : { \"type\" : \"array\", \"items\" : { \"type\" : \"string\", \"maxLength\" : 32 }, \"maxItems\" : 4 }\n",
    "    },\n",
    "    \"required\": [ \"description\", \"name\", \"age\", \"armour\", \"weapon\", \"class\", \"mantra\", \"strength\", \"quest_items\" ],\n",
    "    \"additionalProperties\": false\n",
    "}\n",
    "\"\"\"\n",
    "\n",
    "character_schema_obj = json.loads(character_schema)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Our previous generation complies with this schema:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [],
   "source": [
    "from jsonschema import validate\n",
    "\n",
    "validate(instance=gen_json, schema=character_schema_obj)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now, use our schema with `guidance`:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "application/vnd.jupyter.widget-view+json": {
       "model_id": "063491e5407d4517b28c25e38822b55b",
       "version_major": 2,
       "version_minor": 0
      },
      "text/plain": [
       "StitchWidget(initial_height='auto', initial_width='100%', srcdoc='<!doctype html>\\n<html lang=\"en\">\\n<head>\\n …"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "from guidance import json as gen_json\n",
    "\n",
    "generated = lm + \"A character attuned to the forest\"\n",
    "generated += gen_json(schema=character_schema_obj, name=\"next_character\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Again, we have a valid JSON result:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "{\n",
      "    \"description\": \"A mystical being that embodies the spirit of the forest, with the ability to communicate with plants\",\n",
      "    \"name\": \"Thalorien\",\n",
      "    \"age\": 50,\n",
      "    \"armour\": \"leather\",\n",
      "    \"weapon\": \"axe\",\n",
      "    \"class\": \"druid\",\n",
      "    \"mantra\": \"Nature's harmony, life's balance\",\n",
      "    \"strength\": 8,\n",
      "    \"quest_items\": [\n",
      "        \"Ancient Oak Seed\",\n",
      "        \"Moonlit Blossom\",\n",
      "        \"Elderberry Potion\"\n",
      "    ]\n",
      "}\n"
     ]
    }
   ],
   "source": [
    "loaded_character = json.loads(generated[\"next_character\"])\n",
    "\n",
    "validate(instance=loaded_character, schema=character_schema_obj)\n",
    "\n",
    "print(json.dumps(loaded_character, indent=4))"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<hr style=\"height: 1px; opacity: 0.5; border: none; background: #cccccc;\">\n",
    "<div style=\"text-align: center; opacity: 0.5\">Have an idea for more helpful examples? Pull requests that add to this documentation notebook are encouraged!</div>"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.12.9"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}
